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Abstract.  Lanczos  vectors  computed  in  finite  precision  arithmetic  by  the  three- 
term  recurrence  tend  to  lose  their  mutual  biorthogonality.  One  either  accepts  this 
loss  and  takes  more  steps  or  re-biorthogonalizes  the  Lanczos  vectors  at  each  step. 
For  the  symmetric  case,  there  is  a  compromise  approach.  This  compromise,  known 
as  maintaining  semi-orthogonality,  minimizes  the  cost  of  re-orthogonalization.  This 
paper  extends  the  compromise  to  the  two-sided  Lanczos  algorithm,  and  justifies  the 
new  algorithm. 

The  compromise  is  called  maintaining  semi-duality.  An  advantage  of  maintaining 
semi-duality  is  that  the  computed  tridiagonal  is  a  perturbation  of  a  matrix  that  is 
exactly  similar  to  the  appropriate  projection  of  the  given  matrix  onto  the  computed 
subspaces.  Another  benefit  is  that  the  simple  two-sided  Gram-Schmidt  procedure  is 
a  viable  way  to  correct  for  loss  of  duality. 

Some  numerical  experiments  show  that  our  Lanczos  code  is  significantly  more 
efficient  than  Arnoldi’s  method. 
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1  Introduction 

For  non-Hermitian  matrices  approximate  eigenvalues  from  the  (2-sided)  Lanczos  pro¬ 
cess  are  much  more  accurate  (for  the  same  elapsed  time  and  starting  vectors)  than 
those  from  the  Arnoldi  method.  Consequently  it  is  important  to  implement  the  Lanc¬ 
zos  algorithm  as  well  as  possible.  This  paper  summarizes  the  analysis  in  [7]  and  claims 
to  show  the  best  (or  nearly  best)  way  to  do  it. 

This  article  shows  that  it  is  not  necessary  to  re-biorthogonalize  the  Lanczos  vec¬ 
tors  at  every  step  in  order  to  approximate  the  behavior  of  the  algorithm  in  exact 
arithmetic.  A  property  of  the  computed  Lanczos  vectors  called  semi-duality  may 
be  imposed  (defined  in  §3)  and  suffices  for  keeping  close  to  the  exact  algorithm  at 
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minimal  cost.  Semi-duality  is  less  expensive  to  maintain  than  duality,  and  equally 
effective.  Having  stated  our  contribution  we  resume  the  introduction. 

Krylov  subspace  methods  determine  a  useful  basis  for  the  Krylov  subspace 

B)  =  span {q,  Bq , ...,  B'^q). 

The  eigenvalues  of  certain  projections  of  B  onto  K'(q,B)  serve  as  approximations  to 
eigenvalues  of  B.  The  eigenvalues  of  projections  of  B  are  often  called  Ritz  values. 
They  are  not  Raleigh-Ritz  approximations,  except  in  the  Hermitian  case,  but  we  do 
not  have  a  better  name  for  them. 

The  most  popular  Krylov  subspace  method  for  non-Hermitian  matrices  is  the 
Arnoldi  algorithm  [26].  An  orthonormal  basis  of  JCx(q ,  B)  is  computed.  The  orthogo¬ 
nal  projection  of  B  onto  K'(q.B)  is  represented  by  an  i— by— i  Hessenberg  matrix. 

The  non-Hermitian  or  two-sided  Lanczos  algorithm  is  another  Krylov  subspace 
method.  Given  two  starting  vectors  p *  =  p\  and  q  =  q1,  the  two-sided  Lanczos 
algorithm  computes  at  the  same  time  a  basis  for  the  right  Krylov  subspace,  V(q,B ), 
and  a  dual  basis  for  the  left  Krylov  subspace 

AT(pM,R)  =  span(p’,  ...,p*Bl~1). 

The  Lanczos  algorithm  computes  the  partial  reduction  of  B  to  tridiagonal  form. 

In  exact  arithmetic  the  Hermitian  Lanczos  algorithm  determines  a  matrix  of  or¬ 
thogonal  vectors,  Q,  while  the  two-sided  Lanczos  algorithm  determines  two  matrices 
P  and  Q  such  that  P~Q  is  diagonal.  This  relation  among  the  Lanczos  vectors  is  often 
called  biorthogonality  [15]. 

Lanczos  vectors  computed  in  finite  precision  arithmetic  by  the  three-term  recur¬ 
rence  tend  to  lose  their  mutual  biorthogonality.  Two  ways  to  compensate  for  this 
phenomenon  are  known;  Lanczos  with  full  re-bi-orthogonalization  (LanFRB)  [4]  and 
acceptance  of  the  loss  of  biorthogonality  which  forces  more  steps  to  be  taken  [6.  10]. 
For  the  Hermitian  case,  a  compromise  is  known  [11,  18,  19,  27,  28].  This  compromise, 
known  as  maintaining  semi-orthogonality,  minimizes  the  number  of  reorthogonaliza- 
tions.  This  article  extends  the  compromise  to  the  two-sided  Lanczos  algorithm,  and 
justifies  the  new  algorithm. 

There  are  known  cases  when  the  simple  recurrence  takes  extreme  amounts  of  time 
[10].  On  the  other  hand  LanFRB  is  expensive  for  a  long  run.  The  compromise  we 
present  in  this  article  is  better  than  either  of  the  two  extremes. 

Better  approximations  to  eigenvalues  of  B  tend  to  be  computed  from  a  single 
Krylov  subspace  of  dimension  one  hundred  than  from  four  Krylov  subspaces  of  di¬ 
mension  twenty  five.  To  take  advantage  of  this  property,  we  want  to  use  large  Krylov 
subspaces.  The  amount  of  data  transfer  (  from  memory  to  the  computational  unit  ) 
required  in  Lanczos  with  full  rebiorthogonalization  when  n  is  large  is  significant.  In 
this  respect,  the  compromise  is  at  least  twice  as  fast  as  maintaining  full  biorthogo¬ 
nality.  Usually  it  is  much  faster. 
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The  state  of  the  art  in  Lanczos  methods  is  to  select  from  one  of  four  algorithms. 
The  user  first  selects  either  the  three-term  recurrence,  or  LanFRB.  This  choice  is  a 
trade-off  between  the  low  cost  per  step  of  the  three-term  recurrence  and  the  limited 
number  Lanczos  steps  taken  by  LanFRB.  Then  the  user  selects  an  implementation 
with  or  without  Look-Ahead  [10,  21,  20].  Look-Ahead  enhances  stability  while  in¬ 
creasing  cost  modestly.  This  article  does  not  consider  implementations  with  the 
Look-Ahead  feature. 

The  word  re-orthogonalization  in  the  Hermitian  case  is  ugly  enough  but  the  anal¬ 
ogous  term  rebiorthogonalization  goes  too  far  (nine  syllables).  So  we  seek  a  term  with 
fewer  syllables.  In  Functional  Analysis  row  vectors  represent  linear  functionals  and 
the  property  p*q:  =  6ij  (Kronecker’s  delta)  says  that  the  ordered  sets  {p*,...,p*}  and 
{qi....,qj}  are  a  pair  of  dual  bases  for  K'(q,B)  and  )Cl(p*,B).  So  we  use  the  term 
dual  instead  of  biorthogonal.  Consequently  we  speak  of  maintaining  duality,  local 
duality,  and  semi-duality  (introduced  in  §3). 


1.1  Summary 

Our  results  extend  earlier  work  done  in  the  Hermitian  case  [27,  28],  but  new  issues 
arise  in  the  non-Hermitian  case.  In  exact  arithmetic  the  Lanczos  algorithm  determines 
a  tridiagonal-diagonal  pencil  (T,  SI)  such  that  SI-1!1  is  similar  to  the  projection  of  B 
onto  the  spans  of  the  Krylov  subspaces.  See  Definition  2.1.  The  diagonal  elements 
of  fl  are  defined  to  be  the  inner  products  of  consecutive  pairs  of  normalized  Lanczos 
vectors.  In  exact  arithmetic  the  algorithm  breaks  down  if  Q  =  diag(u.';)  is  singular. 
In  finite  precision  arithmetic  breakdowns  are  rare,  but  near  breakdowns  are  not.  It 
is  tempting  to  require  that  |cj,  |  >  -v/e,  where  e  is  the  round  off  unit,  but  our  rather 
lengthy  analysis  shows  that  the  algorithm  is  still  viable  provided  that  |u>j|  >  je.  Below 
that  level  the  accuracy  of  the  Ritz  values  does  not  generally  improve  if  the  recurrence 
continues. 

The  remainder  of  this  work  is  organized  as  follows.  Section  2  contains  a  discussion 
of  what  is  known  about  solving  eigenvalue  problems  using  the  two-sided  Lanczos  pro¬ 
cess.  The  basic  properties  of  the  Lanczos  algorithm  are  reviewed,  the  implementation 
of  the  three  term  recurrences  is  outlined,  convergence  theory  is  discussed,  and  our 
practical  experience  with  implementations  of  the  three  term  recurrence  is  summa¬ 
rized. 

That  done,  we  move  on  to  the  tricky  issue  of  when  to  “re-bi-orthognalize,,  or 
correct  the  Lanczos  vectors  to  restore  duality.  The  candidate  Lanczos  vectors  are 
computed  by  the  three  term  recurrence,  but  at  certain  steps,  the  loss  of  duality  of 
the  candidate  Lanczos  vectors  to  the  previous  Lanczos  vectors  is  “too  large",  and 
then  we  correct  them  to  obtain  the  final  Lanczos  vectors.  To  obtain  a  competitive 
algorithm,  correction  steps  are  implemented  just  like  a  step  of  the  Lanczos  algorithm 
with  full  re-bi-orthogonalization  (LanFRB).  Since  the  duality  of  the  Lanczos  vectors 
is  not  maintained  to  full  precision,  this  process  must  be  justified.  The  viability  of  the 


two  sided  Gram-Schmidt  process  is  established  in  §3. 

In  §4  the  properties  of  the  Lanczos  algorithm  with  correction  are  developed.  In 
§4.4  we  show  how  to  monitor  the  loss  of  duality  among  the  computed  Lanczos  vectors 
without  significantly  increasing  the  cost  of  the  algorithm.  For  efficiency  the  correction 
steps  must  be  invoked  as  rarely  as  possible  consistent  with  maintaining  accuracy  in 
the  approximations. 

In  §5  we  prove  that  an  added  advantage  of  maintaining  semi-duality  is  that  the 
computed  pencil  (T,  fl)  is  a  perturbation  of  a  pencil  that  is  exactly  equivalent  to  the 
projection  of  the  operator  onto  the  computed  subspaces.  The  norm  of  the  perturba¬ 
tion  is  as  small  as  the  data  warrants.  §6  illustrates  some  of  our  results  with  some 
challenging  numerical  examples. 

2  Two-Sided  Lanczos 

The  two-sided  Lanczos  algorithm  is  based  on  the  partial  reduction  of  a  matrix  B 
to  tridiagonal  form.  The  Lanczos  algorithm  starts  from  an  arbitrary  pair  of  vectors 
p*  —  p\  and  q  —  qi.  After  j  successful  steps,  the  matrices  P*  and  Qj  are  produced. 
The  rows  of  P*  span  the  Krylov  subspace  JCJ(p*,B)  and  the  columns  of  Qj  span 
K,J(q,  B ).  The  matrix  Tj  =  P*BQj  is  tridiagonal;  Qj  =  P*Qj  is  diagonal. 

Early  implementations  scale  the  Lanczos  vectors  so  that  Q  —  I  [1,  35],  but  more 
recently  the  advantages  of  maintaining  unit  length  of  all  the  Lanczos  vectors  has 
gained  acceptance  [4,  10.  21],  and  this  is  what  we  do.  A  useful  result  of  our  work  is 
that  Q  can  become  nearly  singular,  cond(fi)  =  0(e),  without  spoiling  of  the  algorithm. 

The  eigenvalues  of  an  oblique  projection  of  B  are  used  to  approximate  the  eigen¬ 
values  of  B. 

Definition  2.1  Let  Qj  =  [qi,...qj\  have  full  rank,  let  P*  =  [pi,...pj]*  also  have  full 
rank.  If  P* Qj  is  invertible  then 


n,  =  QjQ~1p*, 

is  a  projector  (II|  =  Ilj).  It  is  not  orthogonal  (II*  ^  II,).  We  say  that  Ilj  is  an 
oblique  projector  onto  Range  (Qj).  It  is  also  an  oblique  projector  onto  the  dual  space 
{u*Ilj  :  u  e  C-7'}  =  Range(Pj)*.  Thus  YljBYlj  is  a  projection  oj  B  onto  the  pair 
Range  (Qj)  and  Range  (Pj)*  ■ 

Assuming  that  Qn  and  Pf  exist  (that  is  the  algorithm  does  not  break  down) 
the  representation  of  B  with  respect  to  the  basis  {qi,...,qn}  is  Q~lBQn.  IR  =  I, 
which  implies  that  Q~l  =  fR1  P*  and  Q~lBQn  =  Q~lTn.  The  tridiagonal  Q~lTj 
represents  TijBYlj  in  the  dual  bases  {qi,...,qj}  and  {o»f lp\, ..., uj~lp*}.  Similarly  the 
representation  corresponding  to  P*  and  QjQf 1  is  TjQj1. 
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2.1  The  Three  Term  Recurrences 

The  Lanczos  vectors  satisfy  a  pair  of  three-term  recurrences 


(2.2) 

and 


(2.3) 


Qti 


Pi+iP*+i  =  P\B  -  -fp* 

Ui 


l^iv* 

Pi- D 

^j-i 


D  Oi{  Pi^i 

qi+1 7i+i  =  Bqt  -  qt - q,-i - 

C^i—l 


The  coefficients  are  chosen  so  that  the  right  hand  side  of  (2.2)  annihilates  qu  ...,qi 
and  the  right  hand  side  of  (2.3)  is  annihilated  by  p{,...,p*.  The  P s  and  7s  come  from 
the  normalizing  convention.  The  recurrence  stops  if  /3j+1o;J+i7j+i  =  0.  With 


(  P2U2 , 


Tj  :=  tridiag 


,  PjOJj 


Oil, 


,a3  I  , 


V 


72^2,  •••  ,7 jUj 


equations  (2.2)  and  (2.3)  may  be  written  in  compact  form 
(2.4)  7B-T3.n-17  =  ejft+!P-+1, 


and 


(2.5) 


BQ J  -  Qjfij  lTj  =  qj+  i7j+ie‘, 


where  e;  =  (0, ...,  0, 1)*. 


2.2  Ritz  Triplet  Convergence 

The  eigenvalues  of  B  are  approximated  using  the  eigenvalues  of  the  pair  (Tj,  f ij)  for 
increasing  j.  From  the  eigen-triplet  (uP*,  dP ,  vP), 

u\^*Tj  =  0PuP*Q}  and  TjV P  =  QjvPoP, 

form  the  Ritz  triplet  {xP* ,  6\3\  yP)  given  by 

(2.6)  xP*  =  upT*  and  yP  —  QjvP. 

Ritz  triplets  approximate  eigen-triplets  of  B.  When  no  confusion  will  arise  we  omit 
the  super-script  (j). 

The  expression  x*x  (2.5)  xu*  reduces  to 

x*Byi  =  6lx*yi. 
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In  words,  #;  is  the  generalized  Rayliegh  quotient  corresponding  to  x*  =  u*P*  and 
Hi  =  QjVi .  See  §11  of  [17]  for  a  discussion  of  generalized  Rayliegh  quotients. 

We  now  list  what  is  known  about  the  approximations  derived  from  the  first  j  steps 
of  the  algorithm. 

Multiply  (2.5)  by  vl  and  substitute  (2.6)  to  obtain 
(2-7)  Byi  UiOi  —  qj+iPj-\-i'Vi(j') . 

The  remarkable  property  of  (2.7)  is  that  the  right  hand  side  can  be  computed  without 
forming  y;.  Even  in  exact  arithmetic  \\yi\\2  can  be  smaller  than  ||uj||2.  Thus  a  small 
value  of  |n2(j)|  is  a  necessary  though  not  sufficient  indication  that  #;  be  close  to  an 
eigenvalue  of  B. 

The  perturbation  theory  for  the  eigenvalue  problem  is  more  complicated  than  in 
the  Hermitian  case.  The  Lanczos  algorithm  eventually  yields  approximate  eigentriples 
(x*,9,y)  where  ||:r*||2  =  1  =  ||y||2  such  that  the  corresponding  residuals  \\x*(B  —  9I)\\2 
and  ||  (B  —  9I)y\\2  are  small.  Such  triples  exactly  solve  a  nearby  eigenvalue  problem 
[13].  The  good  thing  is  that  the  eigenvalues  of  £I“17)  for  which  the  residual  norms 
are  small  persist  as  approximate  eigenvalues  of  fl^Tk  for  k  >  j  [13]. 

The  residual  norm  \\{B  —  #j/)yjj|2  is  a  pessimistic  estimate  of  the  accuracy  of 
and  a  good  estimate  of  the  accuracy  of  yl.  The  accuracy  of  generalized  Rayleigh 
quotients  is  proportional  to  the  product  of  the  residual  norms.  To  be  precise,  if 
(. x*.9i,yi )  approximates  an  eigentriple  of  B  that  is  well  separated  (see  Theorem  2.1 
in  §5  of  [30]),  then  the  accuracy  of  6t  is  proportional  to 


This  product  divided  by 

gap (0i,Tj)  =  mini#*  -  6k\ 

k^i 

is  a  realistic  backward  error  estimate  for  9t  [3]. 

One  sided  algorithms,  and  in  particular  the  Arnoldi  algorithm,  do  not  enjoy  this 
property. 

To  factor  the  exact  shrinkage,  ||x||2/|M|2  and  H2/H2/IMI2,  into  the  error  estimates 
to  obtain  asymptotic  error  bounds,  one  must  first  compute  the  Ritz  triplets.  For 
an  n-by-n  real  operator,  B.  after  j  Lanczos  steps  the  number  of  real  floating  point 
operations  (flops)  required  to  compute  the  matrices  of  left  and  right  eigenvectors  for 
m  Ritz  values  is  8 nmj.  This  is  often  more  flops  than  are  required  for  the  Lanczos 
run. 

Fortunately  a  realistic  lower  bound  on  the  shrinkage  is  available  if  the  duality  of 
the  computed  Lanczos  vectors  is  maintained.  In  this  case  y  =  QjV  satisfies  P*y  = 
CljV.  Also  since  the  Lanczos  vectors  are  normalized  to  have  unit  Euclidean  length, 
||Pj  H2  <  y/j.  Combine  these  two  equations  to  find 


IM|2> 


IMI2  > 


PjVh 

Vj 


V7 
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Similarly  if  x*  =  u*P*,  then  ||x*||2  >  \\u*Clj\\2/y/j. 

2.3  Practical  Experience 

Without  careful  observation,  there  is  no  science.  This  section  discusses  surprising 
behavior  that  has  been  consistently  observed  in  large  scale  scientific  computations 
using  the  non-Hermitian  Lanczos  algorithm  [10].  In  the  case  p\  —  qi, 

•  The  sequence  {|u;j|}i>o  tends  to  be  decreasing,  sometimes  precipitously. 

•  Even  when  \ujj\  has  declined  to  nearly  lOne,  approximate  eigenvalues,  eigenvec¬ 
tors,  and  solutions  to  linear  systems  continue  to  converge. 

•  Even  when  \u3\  has  declined  to  nearly  10 ne,  (7),  Q,j)  is  almost  always  graded  so 
that  the  growth  factor 

(2.8)  =  maxdlTjfi^Hoo,  ||ft717j||1)/||JB||2 

is  of  order  unity,  say  less  than  10. 

As  usual,  e  denotes  the  machine  precision.  The  scalar  3>j  measures  the  relative 
size  of  the  intermediate  quantities  introduced  during  the  Lanczos  algorithm.  It  is 
essential  to  distinguish 

\\SIJ%  =  l/minjcjil 

from  Tj.  The  quantity  HQjdb  can  be  large,  nearly  e_1,  without  necessarily  effect¬ 
ing  the  accuracy  of  the  eigenvalues  and  eigenvectors.  The  error  in  computing  the 
eigenvalues  of  is,  among  other  things,  proportional  to  ||fI“1TJ-||i.  But  in  the 

rare  case  that  is  large,  the  Lanczos  algorithm  is  unstable  due  to  the  introduction 
of  large  intermediate  quantities.  The  approximate  eigenvalues  suffer  perturbations 
like  ed>j||B||2.  In  this  case,  Look-Ahead  Lanczos  is  recommended  [10,  20].  When 
=  0(1)  ,  we  expect  our  approximations  to  be  as  accurate  as  the  data  warrants. 


3  The  Viability  Of  The  Two-Sided  Gram-Schmidt 
Process 

This  section  studies  the  central  problem  of  how  to  maintain  adequate  duality  between 
the  two  sequences  of  Lanczos  vectors  {p{,...,p*}  and  {gi, ...,  at  a  reasonable  ex¬ 
pense.  It  will  help  to  recall  the  corresponding  technical  problem  in  the  symmetric 
case  when  Pi  =  qi,  i  =  1, j.  See  [22,  19,  27,  28].  Suppose  that  the  three-term  recur¬ 
rence,  in  finite  precision  arithmetic,  returns  a  unit  vector  q'J+l  that  is  not  orthogonal 
to  the  previous  qu  i.e.  Q)q'j+ 1  is  not  negligible.  The  Gram-Schmidt  process  replaces 
q'j+1  by  a  normalized  version  of  (I3  -  QjQ*)q'j+1.  This  procedure  is  appropriate  if 
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QjQj  =  Ij,  but  can  actually  makes  things  worse  if  Qj's  columns  are  not  orthogonal. 
The  interesting  question  here  is  how  much  can  || Ij  -  Q*Qj ||2  be  permitted  to  grow 
and  yet  guarantee  that  (Ij  —  Q  jQ"-)q'j+l  is  orthogonal  to  range(<5j)  to  within  working 
accuracy. 

Our  problem  is  similar  but  more  complicated.  The  formal  two  sided  Gram- 
Schmidt  operator  is  Ij  -  Qjttj'P*  where  %  =  diag (P*Qj).  How  large  can  we  permit 
\\Ij  -  lb  to  grow  and  yet  get  what  we  want  by  applying  Ij  -  QJQ~1P J  to 

(p'+iT  and  9'+1? 

The  answer  in  the  symmetric  case  is  that  semi-orthogonality  suffices.  More  pre¬ 
cisely,  if 


(3--0  II Qlli+i  111  <  >/e 

holds  for  i  =  1  —  1  then  if  ||Qj^'+1||i  <  \fl  we  may  take  qJ+i  =  q'j+1,  but  if 

<  WQ^j+i  ||i  <  lOOyT  then 


&+i  =  ih  -  QjQjWj+i 

satisfies 

ll<5j?i+i||1<100(j-l)e<v^ 

provided  lOOj y/e  <  1;  a  mild  constraint  on  the  length  of  Lanczos  runs.  For  example,  in 
single  precision  it  is  dangerous  to  make  runs  of  over  100  steps.  If  ||<?“?j-+1||i  >  lOOvT. 
we  compute  q}  to  replace  q3.  and  then  recompute  q']+1 .  This  requires  an  extra  matrix- 
vector  multiplication. 

We  shall  give  a  similar  condition  in  the  two-sided  case;  semi-duality  suffices.  How¬ 
ever  the  definition  of  semi-duality  is  not  as  simple  as  in  the  symmetric  case  and  we 
postpone  it  until  more  notat  ion  has  been  developed.  An  added  benefit  from  maintain¬ 
ing  semi-duality  is  that  the  computed  tridiagonal-diagonal  pair  of  Lanczos  matrices 
(Tj,Qj)  is  equivalent,  to  within  roundoff  error,  to  the  true  “projection”  of  B .  namely 
(Pj BQj,  PjQj).  More  precisely  we  will  show  that  the  no-breakdown  condition 

(3.2)  min  |w,-|  >  10je, 

i<j 

and  the  semi-duality  condition 


(3.3) 


maxdKp'+J’g.in, 

t<J~l 


r1/2iun  m-ll2p;cM<v, 


suffice  to  ensure  the  preservation  of  ( TjMj )  described  in  the  previous  sentence  (see 
Theorem  5.13).  It  is  necessary  to  strengthen  the  condition  (3.3)  somewhat  to  guar¬ 
antee  the  viability  of  two-sided  Gram-Schmidt  (GS)  when  (3.2)  is  nearly  an  equality. 
Note  that  in  the  symmetric  case  f Ij  =  I.}  and  (3.3)  reduces  to  (3.1)  as  claimed. 

The  superscript  '  in  p(+1  and  q(+1  indicates  that  these  are  the  candidate  Lanczos 
vectors  computed  by  the  three  term  recurrence,  but  not  necessarily  the  actual  i-f  1th 
Lanczos  vectors. 
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3.1  Analysis  of  GS 

We  are  going  to  derive  a  matrix,  M,  whose  norm  is  the  “right”  factor  by  which  duality 
is  enhanced  in  GS.  Recall  from  §2.1  that  at  the  end  of  step  j  the  Lanczos  algorithm 
has  computed  dual  matrices  of  Lanczos  vectors  P*  and  Qj,  candidate  Lanczos  vec¬ 
tors  (p'  + J*  and  q'j+1,  and  u)]+i  denotes  the  computed  value  of  the  inner  product 
(Pj+i)*Qj+v  We  assume  u3+\  /  0.  Due  to  the  loss  of  duality,  P* Qj  Qj  and  off 
diagonal  entries  of  P*Q3  could  be  as  large  as  1  if  the  three  term  recurrence  is  not 
modified. 

Suppose  that  \\P*q'j+l\\2  and  Wip'j+iYQjh  are  too  big  (criterion  to  be  discussed 
later).  GS  yields  new  candidates  (pj+i)*  and  q3+\  satisfying 

(*«)•  =  (?;+.)•(/, 

and 

sj+i  =  (h  -  QjVjlp;)qU  i- 

Now  we  examine  the  new  duality  situation. 

(Pi+\YQi  =  (.p'wY(is  -  Q^'p;)Q,  = 

(p’hiYQYi,  -  aj'PjQi), 

and 

=  p;(h  -  QjVj'PjWj+i  = 

(h  -  p;Q^j')p;qUr 

The  factor  D”1  in  the  middle  is  alarming  because  we  expect  some  Ui  to  become  quite 
small  and  we  fear  that  off  diagonal  entries  may  rise  too  close  to  1.  This  feature  is 
absent  in  the  symmetric  case.  However  the  situation  is  better  than  it  appears. 

We  can  obtain  a  more  balanced  expression  for  the  duality  of  the  new  vectors 
(Pj+i)*  and  qj+ 1  by  writing 

Q3  =  |  Qj  1 1/2sign(flj )  |  Dj  | 1/2 . 

Then  modified  expressions  for  the  duality,  namely 

(p,+1YQj\ fy1/2  =  (p',+iYQ,(i,  -  n-'p-QYm-'P  = 


(3,4)  (p'+i)*ei|n,r,/2sign(n-)  (sign(fl,)  - 

and 

\n,\-'pp;q,^  = 
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(3.5)  (sign(Sy  -  \n,\-'/2P;q'j+1 

show  that  the  balanced  “reducing  factor”  after  applying  GS  is  \\Mj\\  where 

(3.6)  Mj  =  sign(Gj)  -  \nj\~ll2P;Qj\nj\~112 . 

We  get  no  benefit  from  the  cost  of  GS  unless  ||M|j  is  much  less  than  one. 

We  choose  to  measure  duality  using 

m%r1/2%+iiii  and  \\p*j+iQj\nj\~l/2\\oo, 

and  define  the  effectiveness  of  GS  using  the  balanced  connection  matrix  Mj.  Note 
that  ||x||i  =  HrrlU. 

To  illustrate  the  advantage  of  a  balanced  connection  matrix,  we  applied  LanFRB 
to  the  matrix,  B,  that  arises  from  the  finite  difference  discretization  (five  point  stencil) 
of  the  partial  differential  operator 

T[u](x)  =  —A  u  +  50V-(rix)  —  12bu 

on  a  regular  31— by— 31  grid  over  the  unit  square  with  zero  boundary  values  [34]. 
Though  the  eigenvalue  problem  for  B  is  ill  posed  (because  the  coefficients  50  and  125 
are  enormous  compared  to  the  grid  size)  this  example  is  relevant  because  breakdown 
occurs  at  step  56,  and  at  the  previous  step 

k’5o|  ~  le  -  13  and  | j Qrt1  Tbs || i  ~  ll||£||i- 

Figures  1  and  2  display  the  absolute  values  of  the  entries  of  the  unbalanced  connection 
matrix,  /55-Q^1  P?-Q55,  and  the  balanced  connection  matrix,  M55,  on  a  semi-log  scale. 

Recall  that  (pj+i)*  and  qj+ 1  are  obtained  from  (p'  + })*  and  q']+l  by  GS.  If 

(3-7)  Wi+iyQM-^  < 

then,  bv  (3.4). 

ll(ft+i)’Qjlnjl"1/2IL  <  e. 

If 

(3-g)  iiwW^+iiii  <  Wr- 

lb  will 

then,  bv  (3.5). 

ui%ri/2p,-9rtlii,  <  t. 
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The  sequence 

(3.9)  (max(||(p'+1)-(3i|nj|->/2|U,|||njr1/V;,'+1||1)))>1 

tends  to  increase  with  j  gradually  until  the  last  term  is  too  big.  At  that  step  a 
correction  step  is  made:  (q'j+1  — ►  qJ+ x).  This  change  reduces  the  latest  term  in  (3.9) 
to  e.  Hence  semi-log  graphs  of  (3.9)  look  sawtoothed. 

We  use  this  perspective  on  GS  to  find  the  “right”  definition  of  semi-duality.  For 
overall  efficiency  we  want  to  minimize  the  total  number  of  corrections  and  particularly 
avoid  unnecessary  corrections  near  the  end  of  a  Lanczos  run.  So  we  seek  the  weakest 
conditions  that  give  adequate  levels  of  duality.  To  this  end  we  explicitly  ensure  that 

(3.10)  max(||(P'+I)-<?J|S!)r1/2|U|A/,ll»,i|Mj||1||%r1^;9'+1||I)  <  e. 


Condition  (3.10)  takes  no  account  of  and  if  \uj+1  |  is  too  small  it  is  essential  not 
to  accept  (p)+1)*  and  <z)+1-  So,  in  addition  to  (3.10)  we  must  take  account  of  possible 
growth  in  ||Afj+i||].  To  analyze  this  note  that  the  nonzero  part  of  the  rightmost 
column  of  Mj+i  is 

n  _1/2p*/  i— 1/2 

W  T)Wj+ i  r  j+i  I 


11^7  2p/^+ik+il_1/2lli  <  ll^i+illi. 

a  necessary  condition  for  (3.10)  to  hold  at  step  j  +  2  without  correction  is  that 


u^+.iii  ii  i%+1|-,/2p,^i9'+2ii1  <«. 

The  square  root  of  (3.11)  yields 

(II  |a,|-I/!P>-9'+1||1||  |nJ«r,/2py19'«||,)1/2  < 

and  this  is  guaranteed  by 

(3.12)  max(||  l%|-1'2.Fy9'+1||!ji  |-1/2Pyl9'+2||,)  <  «I/2k+1|1'2. 

The  similar  argument  applied  to  (p]+1)'  yields  our  definition  of  semi-duality. 
Definition  3.13  Semi-duality  holds  at  step  j  -f  1  if  for  i  <  j , 

maxUijdiGiia-r^iu  ii  |fi.r1/2p-?;+1iii)  <  a'2  k+1i,/4. 
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4  The  Lanczos  Algorithm  With  Correction 

In  this  section,  we  present  a  practical  and  efficient  implementation  of  the  Lanczos 
algorithm  with  correction  (LanCor  hereafter).  Several  implementation  details  are  ad¬ 
dressed  and  relevant  properties  of  the  computed  quantities  are  established.  Extensive 
work  from  [7]  on  how  to  implement  the  Lanczos  recurrences  is  summarized  in  §4.1. 
In  §4.2  wre  discuss  how  to  correct  the  duality  loss  and  what  effect  this  has  on  the 
computed  quantities.  An  efficient  implementation  of  correction  steps  called  retroac¬ 
tive  correction  is  discussed  and  justified  in  §4.3.  In  §4.4  we  will  show  how  to  compute 
(p'j+iYQj  and  PjQj+i  without  accessing  P*  and  Qj  and  using  only  O(j)  floating  point 
operations  and  storage  per  step.  The  Lanczos  algorithm  with  correction  is  given  in 
§4.5. 


4.1  Implementing  The  Three  Term  Recurrences 

A  prerequisite  to  the  analyses  of  later  sections  is  an  understanding  of  how  nearly  dual 
consecutive  pairs  of  left  and  right  Lanczos  vectors  can  be.  We  say  that  local  duality 
holds  at  step  j  if, 


(4.1) 


max  (I  p*qi-i\,\P*i-iqi\)  <  4e- 

1<2<? 


In  [7]  it  is  proved  that  local  duality  is  maintained  to  within  a  (theoretically  necessary 
but  generally  unrealistic)  factor  of  n  by  the  implementation  of  the  three  term  recur¬ 
rences  below  called  LanLD.  LanLD  stands  for  the  Lanczos  algorithm  maintaining 
local  duality. 

LanLD  Algorithm 

Start:  pl  =  p/|b||2Wi  =  =  1, Pi  =  7i  =  0,uq  =  obl¬ 


iterate:  For  i=l,MaxStep 

1  •  r*=  p*B  -  ^pU ,  =  Bqi  -  i  ^ 

2.  e  {p*st,r*qt} 

3-  r*  :=  r*  -  ^p*,  *  :=  *  -  fcg 

4.  Qi[  =  r*qi,  a\  =  p*Si  /*  these  are  small  corrections  to  cq  */ 


5.  r*  :=  r* 


Si  ■—  Si  q 


6.  =  1 1 w* 1 1 2 ,  7i+i  =  |N|2 

7.  Check  for  invariant  subspace.  See  equations  (4.3)  and  (4.4). 

8.  p*+i  =  r*/pi+ 1,  ql+ 1  =  Sj/7t+i 

9.  u>i+\  =  p*i+1qi+i 
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10.  Check  for  breakdown:  |u;i+1|  <  10(i  +  l)e 

11.  Check  for  convergence  periodically  (  see  §2.2  ) 

Remark  1.  The  meaning  of  step  2  is  that  at  can  be  assigned  either  of  the  values 
p*Si  or  r*qj,  it  does  not  matter  which. 

Remark  2.  Local  duality  is  maintained  by  steps  4  and  5. 

The  properties  of  the  quantities  computed  by  LanLD  are  summarized  as  follows. 
We  assume  that  the  no  breakdown  condition  holds, 


(4.2)  min.  |wt|  >  lOje, 

1<2<J 

that  the  no  invariant  subspace  conditions  hold:  for  1  <  i  <  j, 

(4.3)  Pi+i  >  max(\/e(<J)i  +  1  +  ^)||R||2> 
and 

(4.4)  7«+i  >  max(V€($i  +  1  +  ^)\\B\\2,  KMD- 


Here  is  the  growth  factor  defined  (2.8),  and  the  constant  ib  accounts  for  the 
rounding  error  introduced  when  the  operator  B  is  applied. 

In  this  section,  we  add  two  more  error  bounds  to  our  model  of  the  computed 
quantities  which  are  proved  to  be  realistic  in  [7].  First,  the  Lanczos  vectors  satisfy 
the  perturbed  three-term  recurrences 


(4.5) 

and 

(4.6) 


Pj+i  Pj+i  =P*jB- 


p-  -  ^-pU  ~  lj 

UJj  J  U)j- 1  J  J 


Qj+ilj+\  ~  Bqj  Qj 


a  j  +  ctTj 

w,- 


8j  ujj 

ffi-irH'  -to, 


j)  W  - 1 

where  the  matrices  Fj  =  [/i, ...,  fj]  and  Gj  =  [^1?  ...,gj]  are  such  that 


(4.7)  max(||F:;||2,||Gj||2)<e(^  +  l)||R||2. 

Second  the  tiny  refinements  to  the  trailing  bits  of  each  a,  to  maintain  local  duality, 

(4.8)  Dlj  =  diag(or{)4=1  and  D]  =  diag(a-)i=i 
satisfy 

(4.9)  max(||D'||2,  ||Z);||2)  <  e(4,  +  l)||fl||2. 
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4.2  Properties  Of  The  Computed  Quantities 

Recall  from  §3  that  the  loss  of  duality  of  the  candidate  Lanczos  vectors  to  the  previous 
Lanczos  vectors  is  corrected  using  a  version  of  the  Gram-Schmidt  process.  Our  model 
of  the  properties  of  the  quantities  computed  by  LanCor  is  obtained  by  amending  the 
model  for  LanLD  to  account  for  correction  steps. 

To  correct  the  loss  of  duality  of  the  ( i  +  l)st  Lanczos  vectors  to  the  previous 
Lanczos  vectors  at  the  end  of  a  Lanczos  step  we  first  compute 

(4.10)  x*  =  Vi  =  PUqM- 

Next  we  “re-bi-orthogonalize”  or  correct  the  candidate  Lanczos  vectors; 

(4.H)  (*+>)•  =  ((?;«)•  -  1. 

and 

(4.12)  Qj+l  =  Qj+l  ~ 


Let  Xj  denote  the  set  of  all  indices  i  up  to  and  including  j  at  which  correction 
steps  are  taken,  let  e*  denote  the  zth  column  of  the  j  x  j  identity  matrix,  and  let 

(4.13)  A,  =  D‘  +  Y.  ft+1e,(«),,0),  Tj  =  D]  +  £  (  ef7i+,. 

ieij  ieij  V  u  / 


Recall  that  and  Dr3  are  defined  in  equation  (4.8). 

For  the  purpose  of  illustration  T40  +  T40  corresponding  to  a  model  problem  dis¬ 
cussed  in  [24]  is  displayed  on  a  semi-log  scale  in  figure  3. 

The  governing  equations  for  LanCor  are 

(4.14)  p;b  =  (Tj  +  a  ,)n;'p;  +  e;ft+lP*+1  +  f;, 

and 

(4.15)  BQj  =  +  Tj)  +  ?j+17J+1eJ  -I-  Gj. 

The  matrices  Tj  (for  upper)  and  Aj  (for  lower)  are  upper  and  lower  triangular  matri¬ 
ces  of  spikes,  one  spike  for  each  correction  step.  Local  duality,  (4.1),  equation  (4.16), 
and 

(4.16)  max(||Ajfl~1/2||2,  ||^71/2Tj||2)  <  +  1)||5||2. 

are  also  realistic  for  LanCor  [7]. 
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The  matrix  of  spikes  T  +  Upsilon 


40 


Row  Index 


Figure  3:  T40  +  T40 

4.3  Retroactive  Correction 

In  this  section  we  show  how  to  implement  a  correction  step.  As  in  the  symmetric 
case,  correction  steps  are  taken  in  pairs  so  that,  the  duality  of  the  current  Lanczos 
vectors  deteriorates  slowly.  As  we  purge  the  (j  +  l)sf  candidate  Lanczos  vectors 
of  their  components  along  the  previous  Lanczos  vectors,  we  retroactively  purge  the 
jth  Lanczos  vectors  of  their  components  along  the  first  j  —  1  Lanczos  vectors.  We 
implement  the  correction  step  so  that  each  Lanczos  vector  is  transferred  from  slow 
storage  to  fast  storage  and  back  again  only  once.  Following  [23]  we  call  this  retroactive 
correction.  We  apply  the  two-sided  modified  Gram-Schmidt  algorithm  to  both  the 
last  two  pairs  of  Lanczos  vectors.  Retroactive  correction  is  implemented  as  follows. 

Retroactive  Correction  Algorithm 

Iterate:  For  i  =  1  ...j  —  1, 

1.  pj+ 1  :=pj+ 1  -  Pi(uj-*(q*pj+ 1)) 

2.  Pj  :=pj  -  pl{uj'*(q*pj)) 

3.  qj+ 1  :=  qj+ i  -  ql(u,\l(p*qJ+i)) 

4.  qj  :=  q3  -  ql(u~1  (p*q3)) 

Recover  local  duality 

1.  pJ+ 1  :=pj+ 1  -pj{uj*(q*pj+ j)) 

2.  qj+1  :=  qj+1  -  qj(u~1{p*qJ+ j)) 
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It  is  important  to  not  normalize  pj  and  qj  after  a  retroactive  correction  step.  Retroac¬ 
tive  correction  changes  the  relations  among  the  computed  quantities.  Nonethe¬ 
less,  a  careful  analysis  shows  that  as  long  as  semi-duality  is  maintained  the  prop¬ 
erties  (4.1),  (4.16),  (4.9),  and  (4.16)  are  also  realistic  for  LanCor  with  retroactive 
correction  [7]. 


4.4  Monitoring  The  Loss  Of  Duality 


_ -I  fry  _ n  fry 

We  must  correct  for  the  loss  of  duality  when  either  Pj+iQj\&j  \  or  \P* qj+\ 

increases  to  ■v/e|u;]+41|.  To  compute  these  vectors  at  each  step  is  about  as  costly  as 
correcting  the  loss  of  duality  at  each  step.  The  same  problem  arises  in  the  symmetric 
case.  Compromise  symmetric  Lanczos  algorithms  avoid  this  costly  step  by  updating 
a  recurrence  estimating  the  loss  of  orthogonality  at  each  step  [11,  18,  19,  23,  27,  28]. 
In  this  section  we  extend  the  partial  reorthogonalization  algorithm  (PRO)  from  the 
symmetric  Lanczos  algorithm  [27,  28]. 

The  sequence  of  vectors  (p*+iQ»)  and  (. P*qi+i )  satisfy  a  three-term  recurrence 


which  we  now  derive.  Let  Uij  =  p*qj .  In  this  notation,  c u*  —  a;*,*. 

Suppose  that  a  correction  step  is  not  taken  at  step  j  (i.e.  in  computing  the  (j  +  l)st 
Lanczos  vectors).  Multiplying  equation  (4.5)  by  Qj  and  substituting  p*  (4.15),  we 


have 


0i+iPj+iQi  =  PiQi  nj'Fi +  Ti)  - 


Oii 


O', 


LOj 


(4.17)  +  ^o+i7J+1e*  +  p*Gj  -  f*Qj. 

CJj-l 

Similarly  multiplying  (4.6)  by  P*  on  the  left  and  substituting  in  (4.14)  q3,  we  have 


Pj  9j+i7j+i  —  ( (Pj  +  ^j)^j 


a * 


a  U 


UJi 


1  pJ<h 


(4.18)  -  qj- 1  +  ejPj+jLUj+ij  +  F* qj  -  P* gj. 

U>j-l 

To  further  reduce  equations  (4.17)  and  (4.18),  we  first  need  to  discuss  some  ad¬ 
ditional  relations  among  the  computed  quantities.  First  we  show  that  the  correction 
terms 

p'Qjtt-'Tj  and  AjQj'PJqj 

negligibly  effect  the  loss  of  duality  among  the  computed  Lanczos  vectors.  For  this 
reason,  these  matrices  are  not  used  to  estimate  the  loss  of  duality,  and  are  not  stored. 
Since  j  is  not  a  correction  step  equation  (4.13)  implies  that 

AjCtj  Pj q-j 6j  =  Dj e j  —  OjCXj. 
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Substitute  the  definition  of  semi-duality,  (3.3),  and  equation  (4.16)  to  find 


I|Aj«7'  (  V  )  1,2  -  1 1  1/2 1 1 2 1 1 Q  “if  iy_  1<?UI=  <  +  l)e||B||2- 

The  analysis  of  p*QjQ~lTj  is  similar. 

Next  we  expand  terms  such  as  P*  q3 : 

(4.19)  p*Qj  =  (p*Qj-U0)  +  uJje*j,  and  P-qj  =  ^  j  +ujjey 

By  equation  (4.19)  and  the  definition  of  T),  we  have 

rr\-irr  ai  +  aj  T\  _ 

PjQjiytj  Tj  I)  — 

UJ j 

(Q  .  \ 

- ~  a‘jeP 

t  Ocj  H-  oiL 

(W  - -pr^I)p’q‘ 

\JU  j 

(4.21)  =  (t^J1  -  (  Pj-M  j  +  -  e,a'. 

Since  step  j  is  not  a  correction  step  the  last  row  of  T  j  and  the  last  column  of  A  j  are 
zero.  That  is  why  they  do  not  appear. 

We  also  need  the  identities 

(4.22)  P*j-\Qj  =  {P*j-iQj-2-.  0)  +  u)j- ie*_j  +  u>j- \je*, 

and 

I  -  \ 

(4.23)  PjQj- 1=  0  +  Wj-iej-i  +  Ujj-iCj- 

V  0  / 

Finally  substitute  equations  (4.22)  and  (4.20)  into  equation  (4.17)  and  equa¬ 
tions  (4.23)  and  (4.21)  into  equation  (4.18)  and  the  desired  recurrences  appear 

P»iPi+iQi  =  (PjQj- iMnj'Tj  -  ^/) 

LOj 


(4.20) 

and 
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(4.24) 

and 


(4.25) 


0, 0)  +  Uj-ijetj)+ 

Uj- 1 

(wj,j+i7j+i  +  OLj  -  «j)e*  +  0(e($j  +  1)||B||2) 

p;?)+l7j+1  =  (TjQj1  -  %i)  (  7-1®  ) 


( 

\ 

\ 

0 

v\ 

0 

) 

J 

^'(/^i+i^i+ij  +  alj  -  oij)  +  0(e($j  +  1)||-B||2) 


4.4.1  An  Implementation  of  the  Monitoring  Algorithm 

LanCor  is  similar  to  LanLD,  but  with  additional  work  done  (if  necessary)  between 
LanLD  iterations  to  maintain  semiduality.  The  perturbed  recurrences  (4.24),  (4.25) 
are  invoked  to  compute  (53+\p*+lQ3  and  P*q3+\^3+i  after  (3j+ 1  and  7j+i  are  computed 
as  in  step  6  of  LanLD.  The  decision  whether  or  not  to  correct  duality  loss  is  then  made 
as  determined  in  §3.1.  In  this  section  we  show  how  to  implement  the  recurrences  to 
obtain  accurate  estimates  of  the  duality  loss. 

In  LanCor  extended  local  duality  is  maintained.  At  each  step,  j ,  we  explicitly 
‘dualize’  the  candidate  j  +  1th  Lanczos  vectors  the  j  -  1th  Lanczos  vectors  and  then 
the  jth  Lanczos  vectors  (local  duality). 

P*q3+i  is  estimated  by  hj+ 1.  Initially  h2  and  h3  are  exact,  and  for  j  >  2, 


(4.26) 


fyi+iTj+i  —  (TjClj  -0 


Dj 


r  hd  1 

1 

d3" 

i 

_ I 

ILj 

— 

0 

0 

- 1 

'  o 

Pj wi  p  Jj) 
- e3-2<y,  . 

Wj-i  J 


To  account  for  the  perturbation  of  the  three  term  recurrences  by  correction  steps, 
ediag|Dj17’?|  is  added  to  the  right  hand  side  above  if  the  loss  of  the  duality  of  q3-i 
and  q3  was  corrected.  The  j  —  1  and  j  entries  of  the  estimate,  hJ+1,  are  assigned  the 
exact  values  p*j-1qj+ 1,  and  p)q3+i-  Maintaining  extended  local  duality  sweeps  the  ap 
term  from  the  jth  entry  to  the  j  —  2th  entry,  hence  the  ej-2^3  ^  above.  The  estimate 
of  p*j+iQj  is  computed  by  the  similar  recurrence. 
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4.5  The  Implementation  Of  LanCor 

In  this  section  we  summarize  the  implementation  of  LanCor,  the  Lanczos  algorithm 
maintaining  semi-duality. 


LanCor  Algorithm 

Start:  Pl  =p/\\p\\2,qi  =  q/ ||?||2,w0  =  h  A  =  7i  =  0,^  =  p\qi- 
Iterate:  For  i=l,MaxStep 


!■  r*  =  PiB  -  ^Pi- 1.  «*  =  Bcp  -  ql_] 


W*-l 


2.  Qj  -  r*ql 


3.  r*  -  r* 


5 


ft  “  ft” 


4.  Maintain  extended  local  duality  (see  §4.4.1) 
b.  a\  =  r*qh  a[  =  p*Si 


6.  r* 


'  lo1 


7-  A+1  =  Ikilb,  7»+i  =  INh 

8.  Check  for  invariant  subspace.  See  equations  (4.3)  and  (4.4). 

9-  Pi+ 1  ^i/Pi+lt  q%+ 1  =  Si/li- t-1 

10.  Wi+x  =  p*+1ft+i 

11.  Check  for  breakdown:  |u>j+i|  <  lOze 

12.  Monitor  duality  loss  (  see  §4.4.1  ) 

13.  Correct  duality  loss  only  if  necessary  (  see  §4.3  ) 

14.  Check  for  convergence  after  a  correction  step  only  (  see  §2.2  ) 


5  Preserved  Quantities 

The  Lanczos  algorithm  with  correction  (LanCor)  applied  to  an  operator  B  after  j  suc¬ 
cessful  steps  yields  matrices  P*  and  Q3  of  Lanczos  vectors  and  the  reduced  tridiagonal- 
diagonal  pencil  (‘ Tj,Qj ).  In  this  section  we  compare  the  computed  quantities  to  the 
corresponding  exact  quantities  determined  by  B,  the  row  span  of  P*  and  the  column 
span  of  Qj.  We  say  that  a  computed  quantity  is  preserved  when  it  is  as  close  to  the 
corresponding  exact  quantity  as  the  data  warrants. 

Our  main  result  is  that  semi-duality  suffices  to  preserve  (Tj,Qj).  See  §5.3. 

The  analysis  is  more  complicated  than  in  the  symmetric  case.  We  must  avoid 
perturbations  that  are  proportional  to  1 1 T2 ~ 1 1 1 2  =  1/ min^^  |u>j|. 
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5.1  Exact  Projections 

The  operator 

(5.1)  n^QjiPjq^p; 

is  the  oblique  projection  corresponding  the  computed  Lanczos  vectors.  The  pro¬ 
jection  of  B  onto  the  spaces  spanned  by  the  Lanczos  vectors  is  the  matrix  YljBYlj. 
See  Definition  2.1.  The  Lanczos  vectors  are  dual  if  and  only  if  PjQj  is  diagonal  and 
nonsingular. 

We  recover  exactly  dual  bases  corresponding  to  the  computed  Lanczos  vectors  by 
use  of  the  LDU  factorization  of  PjQj ; 

(5.2)  P*Qj  =  LjQjUj. 

Recall  that  Qj  =diag {P*Qj).  If  Qj  is  nonsingular  then  substitute  (5.2)  into  (5.1)  to 
obtain 

(5.3)  n j  =  QjU~1  Cl-1  LJ'P*. 

The  rows  of 

(5.4)  p;  =  Lfp; 
and  the  columns  of 

(5.5)  Qj  —  QjUJ1 
are  dual  since 

P-Q,  =  Lj'P-QiV-'  =  SV 

Two-sided  Gram-Schmidt  applied  to  P*  and  Qj  yields  P*  and  Qj.  Next  define  Tj  by 

(5.6)  Tj  =  P*BQ3. 

Note  that  f3  is  not  tridiagonal.  The  representation  of  XljBXlj  with  respect  to  the 
bases  Clj1P*  and  Qj  is  Clj 1 T) .  This  matrix  is  equivalent  to  the  pencil  (Tj,Clj).  This 
pencil  is  analogous  to  the  orthogonal  projection  of  the  operator  onto  the  span  of  the 
computed  Lanczos  vectors  in  the  symmetric  case. 

5.2  Conditions  for  Preservation 

Each  computed  Lanczos  vector  is  the  sum  of  the  vector  which  exactly  satisfies  a 
three-term  recurrence  and  another  vector  whose  norm  is  proportional  to  the  machine 
epsilon,  e.  See  equations  4.5  and  4.6.  This  result  is  typical  of  Krylov  subspace  methods 
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[19].  This  perturbation  of  the  Lanczos  vectors  causes  perturbations  of  the  diagonal 
elements  of  Q  by  approximately  e,  and  perturbations  of  the  tridiagonal  elements  of  T 
by  approximately  e||I?||.  We  show  that  exactly  correcting  the  loss  of  duality  among 
the  computed  Lanczos  vectors  does  not  change  the  pencil  (T,  Cl)  by  significantly  more 
than  these  amounts.  We  call  this  property  the  preservation  of  T  and  Cl.  The  elements 
of  T  and  Cl  are  not  in  general  determined  to  working  (or  full  relative)  precision.  This 
implies  that  TO-1  and  Q_1T  are  not  determined  to  full  absolute  precision. 

This  section  addresses  the  problem  of  determining  necessary  and  sufficient  condi¬ 
tions  for  three  properties  of  the  computed  quantities  to  hold.  The  three  properties 
are  (1)  that  W  =  P*Q  admits  an  LDU  factorization,  (2)  that  the  diagonal  matrix 
D  is  approximately  Cl,  and  (3)  that  L  and  U  are  well  conditioned.  To  be  precise,  we 
determine  realistic  sufficient  conditions  for  any  complex  nxn  matrix  W  with  nonzero 
diagonal  elements  to  admit  an  LDU  factorization 

(5.7)  W  =  LDU 


such  that 

(5.8) 

and  such  that 

(5.9) 


||diag(W)  -  D ||2  <  2e, 


max(||L  1||2,  || U  1||2)  <  2. 


In  our  case  W  =  P*Q.  By  equation  (5.2)  there  holds  D  =  Cl  and  (5.8)  immediately 
implies  the  preservation  of  Cl.  The  preservation  of  T  is  discussed  in  §5.3. 

Our  results  are  given  in  the  two  theorems  below.  Theorem  5.10  gives  necessary 
and  sufficient  conditions  for  conditions  (5.7)  and  (5.8)  to  hold.  Theorem  5.13  gives 
sufficient  conditions  for  all  three  conditions  to  hold  which  are  only  slightly  stronger 
than  the  hypotheses  of  Theorem  5.10  (i.e.  the  lower  bound  on  \(Wj\  increases  from  2 je 
to  10  je). 

For  any  matrix  C  let  triu'(C )  denote  the  strictly  upper  triangular  part  of  C. 

Theorem  5.10  Let  W  be  an  j  x  j  complex  matrix,  let  Cl  =  diag(TT)  =  diag(u>j). 
Suppose  that  e  >  0,  j  >  2,  (j  —  2)e  <  1,  and  that  W  satisfies  the  following  hypotheses; 

(5.11)  min  |o;j|  >  2 (j  —  2)e, 

1  <i<j 


(5.12)  max(||fi  l/2triu\W)\\i,  ||Q  1^2triu'(W*)\\i)  <  y/e. 

Then  equations  5.7  and  5.8  hold. 
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Proof.  [7] 

Remark.  The  second  hypothesis,  (5.12)  is  equivalent  to  the  semi-duality  condi¬ 
tion  (3.3). 

One  approach  to  proving  Theorem  5.10  is  to  apply  the  perturbation  theory  for 
Gaussian  elimination.  Many  papers  have  appeared  on  this  subject  recently  [2,  32, 
33].  To  guarantee  condition  (5.8),  all  of  these  general  perturbation  bounds  require 
significantly  stronger  hypotheses  than  Theorem  5.10. 

Theorem  5.10  gives  necessary  and  sufficient  conditions  for  the  preservation  of  Cl 
(i.e.  the  condition  (5.8)).  By  increasing  the  lower  bound  on  \uifi  by  a  factor  of  5,  we 
will  show  that  the  computed  quantities  are  preserved.  Due  to  the  difficulty  of  this 
task,  we  must  be  satisfied  with  un-achievable  but  realistic  bounds. 

Theorem  5.13  Let  W  be  an  j  x  j  complex  matrix,  let  Cl  =  diag(W)  =  diag(cu,). 
Suppose  that  e  >  0,  j  >  2,  ( j  —  2)e  <  1,  and  0  is  nonsingular.  Suppose  in  addition 
that  W  satisfies  the  hypothesis  (5.12),  and  that 

(5.14)  min  jwjl  >  lOje. 


Then  equations  5.7  to  5.9  hold. 

Proof.  [7] 

The  idea  of  the  proofs  is  to  decompose  W  into  the  sequence  of  extensions 


(5.15) 


Wi+1  = 


Wi  Vi  \ 

A  ui+ 1  ) 


We  define  the  sequence  1  corresponding  to  a  norm  ||.||  by 

(5.16)  max(||xj||,  IlytH)  =  Ki. 

As  in  [16],  we  then  extract  the  worst  case  information  corresponding  to  {/q}(=1. 


5.3  Preservation  of  T 

The  pencil  computed  by  the  three  term  recurrences  can  eventually  become  of  larger 
order  than  the  original  matrix  and  clearly  differs  from  the  one  that  would  be  produced 
in  exact  arithmetic,  but  this  can  never  happen  if  semi-duality  holds.  In  this  section, 
we  show  that  if  semi-duality  is  maintained  then  the  computed  pencil,  ( Tj,Clj )  agrees 
with  the  exact  oblique  projection  of  the  spans  of  the  computed  left  and  right  Lanczos 
vectors  to  full  absolute  precision. 

Correcting  the  loss  of  duality  of  the  (j  +  l)st  Lanczos  vectors  to  the  previous 
Lanczos  vectors  replaces  the  vectors  computed  by  the  three  term  recursion  with  (ap¬ 
proximations  of)  pj+i  and  qj+i,  where  pj+\  denotes  column  j  +  1  of  P*.  and  qj+\ 
denotes  column  j  +  1  of  Qk-  The  corresponding  elements  elements  of  T  and  Cl  change 
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to  (approximations  of)  the  corresponding  elements  of  the  dense  matrix  T  and  the  di¬ 
agonal  matrix  Cl.  We  want  to  know  how  large  this  perturbation  is,  and  in  particular 
when  it  is  negligible. 

The  following  theorem  established  the  preservation  of  T  for  LanCor.  Recall  that 
Theorem  5.10  establishes  the  preservation  of  0.  The  proof  uses  the  properties  of 
the  computed  quantities  established  in  §4  and  this  section.  The  hypotheses  of  The¬ 
orem  5.13  are  the  no-breakdown  and  no-invariance  conditions  from  §4.1,  and  the 
semi-duality  condition;  for  1  <  i  <  j, 

(5-17)  max(|||Qj|_1/2Pi*5i+1||1,  |||f2ir1/2Q*pi+1||1)  <  y/t. 


Theorem  5.18  Let  B  be  a  complex  n  x  n  matrix,  let  P*  and  Qj  be  the  matrices 
of  Lanczos  vectors  computed  by  the  Lanczos  algorithm  with  correction.  Let  Tj  de¬ 
note  the  computed  tridiagonal,  and  let  Tj  be  defined  as  in  equation  (5.6).  If  Clj  = 
diag (P*Qj)  satisfies  the  no-breakdown  condition,  (4-2),  the  no-invariant  subspace  con¬ 
ditions,  (4-3)  and  (4-4),  an<d  semiduality  holds  (see  5.17  ),  then  for  defined  in 
equation  (2.8),  there  holds 

\\Tj  -^112  =  0(7(^  +  1)6115112). 

Proof.  [7] 

6  Numerical  Experiments 

The  Lanczos  algorithm  maintaining  local  duality  only  (LanLD),  semi-duality  (Lan¬ 
Cor),  and  full  duality  (LanFRB)  have  been  applied  to  many  tasks.  We  present  the 
results  for  one  representative  example  here.  All  computations  were  done  in  MATLAB 
on  an  IBM  Power  Workstation  with  machine  precision  e  ~  2  10-16  =  2e  —  16. 

6.1  The  Tolosa  Matrix 

We  illustrate  the  properties  of  the  Lanczos  algorithm  with  correction  (LanCor)  using 
the  Tolosa  matrix,  A,  of  order  n  =  2000  from  the  Harwell  Boeing  Sparse  Matrix 
Collection.  The  computational  task  is  to  compute  the  largest  eigenvalues  of  A  to  half 
precision.  We  choose  to  compute  the  50  largest  eigenvalues  because  this  emphasizes 
the  difference  between  LanLD  and  LanCor.  A  has  5184  nonzero  entries  and  ||A||i  « 
le  +  6.8.  Since  A  averages  less  than  3  nonzeros  per  row,  the  inner  products  in  the 
three  term  recurrences  cost  nearly  as  much  as  the  matrix  vector  multiplications,  in 
terms  of  floating  point  operations. 

The  eigenvalue  problem  for  A  is  known  to  be  ill  conditioned  and  A  is  known  to 
possess  multiple  eigenvalues  [25].  For  theoretical  purposes,  we  computed  the  eigen¬ 
values  of  A  by  the  QR  algorithm,  and  observed  that  the  spectral  radius  of  A  is 
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approximately  le  -f  3.4.  For  this  reason,  the  MATLAB  function  balanceQ  was  ap¬ 
plied  to  A  to  obtain  a  balanced  matrix,  B ,  diagonally  similar  to  A.  For  this  B  there 
holds  || B || x  «  HSU*,  «  le  +  4.0.  Since  ||i?||2  <  ||B||i||B||oo?  balancing  yields  a  matrix 
whose  Euclidean  norm  is  within  a  factor  of  4  of  its  spectral  radius.  QR  applied  to  B 
computes  three  real  eigenvalues,  —12.098,  —24.196,  and  —36.294,  of  multiplicity  382; 
the  remaining  eigenvalues  are  distinct  and  well  separated.  Though  the  eigenvalues  of 
A  and  B  are  the  same  (barring  underflow),  the  computed  eigenvalues  of  A  and  B  by 
QR  (without  balancing)  agree  to  from  full  to  half  relative  precision.  We  will  compare 
the  eigenvalues  of  B  computed  by  the  QR  algorithm  to  the  three  implementations  of 
the  Lanczos  algorithm,  full  re-bi-orthogonalization  (LanFRB),  LanCor,  and  Lanczos 
with  only  local  duality  (LanLD),  and  Arnoldi’s  method. 

We  did  not  compare  the  Lanczos  algorithm  to  the  Implicitly  Restarted  Arnoldi 
Iteration  [29,  14].  Arnoldi’s  method  does  establish  a  lower  bound  for  the  number  of 
steps  required  by  Implicitly  Restarted  Arnoldi  Iteration.  Implicit  Restarts  can  be 
incorporated  in  the  Lanczos  algorithm  as  well  as  Arnoldi’s  method  [12]. 

6.2  Results 

All  three  implementations  of  the  Lanczos  algorithm  computed  the  requested  50  eigen¬ 
values  to  the  same  high  accuracy.  For  Arnoldi’s  method,  LanFRB,  and  LanLD  the 
Ritz  values  are  checked  every  50  iterations.  In  each  case  pi  =  qi  is  the  same  ran¬ 
dom  vector  (normal  distribution).  LanFRB  and  LanCor  have  identical  convergence 
properties;  this  is  a  consequence  of  the  preservation  of  the  pencil  (T,  ft)  (see  §5). 
The  reward  for  maintaining  semi-duality  is  that  fewer  Lanczos  steps  are  required  to 
complete  the  given  task.  In  this  case,  LanFRB  and  LanCor  required  400  steps  and 
363  Lanczos  steps,  respectively,  while  LanLD  and  Arnoldi’s  method  required  450  and 
400  respectively. 

No  copies  of  converged  eigenvalues  appear  among  the  Ritz  values  when  semi-  or 
full  duality  is  maintained,  but  copies  do  appear  among  the  Ritz  values  computed  by 
LanLD. 

LanCor  takes  16  correction  steps  to  compute  the  requested  eigenvalues,  1  /25th  as 
many  correction  steps  as  LanFRB.  The  table  below  gives  the  last  9  steps  at  which 
the  duality  loss  is  corrected  in  LanCor  and  the  number  of  converged  Ritz  values  at 
that  step. 


Correction  step 

238 

264 

283 

295 

310 

323 

333 

347 

363 

No.  Eigenvalues 

14 

22 

26 

28 

34 

38 

42 

48 

56 

For  comparison  we  applied  Arnoldi’s  method  with  Modified  Gram- Schmidt  or- 
thogonalization  to  this  task  and  computed  eigenvalues  of  B  to  the  same  accuracy 
[26].  The  number  of  converged  Ritz  values  near  the  end  of  the  run  are  displayed 
below. 
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j  Arnoldi  step 

50 

100  150  200 

250 

300 

350 

400 

[  No.  Eigenvalues 

0 

0  0  6 

16 

30 

48 

70 

In  this  example  Arnoldi’s  method,  LanFRB,  and  LanCor  yield  approximate  eigen¬ 
values  of  similar  accuracy  for  a  fixed  number  of  steps;  the  differences  in  the  number 
of  steps  is  due  to  the  convergence  criteria. 

LanFRB,  LanCor,  and  LanLD  were  each  stable  in  the  sense  that 

||fr1T||1«35||R||2, 

even  though  min  |a.’,|  ps  2e  —  5  (see  §2.3). 

The  number  of  floating  point  operations  (flops)  in  these  Lanczos  and  Arnoldi  runs 
are  tabulated  below.  The  flop  count  for  applying  the  operator  (  a  sparse  matrix  vector 
multiplication  in  this  case  )  is  given  in  the  column  OP  below,  the  column  EIG  displays 
the  flop  count  for  solving  the  reduced  eigenvalue  problems  by  the  QR  algorithm  for 
Arnoldi ’s  method,  or  dqds,  an  algorithm  that  exploits  the  tridiagonal  structure,  for 
the  Lanczos-based  procedures  [8].  (BI-)ORTH  gives  the  flop  count  for  maintaining 
the  duality  or  orthogonality  of  the  basis  vectors,  and  the  the  column  ALGO  contains 
the  remaining  flop  count.  The  number  of  steps  required  by  each  method  is  given  in 
parenthesis  below  the  algorithm  name. 


Results  for  Balanced  Tolosa 


FLOPS 

(logic) 

OP 

EIG 

(BI-) 

ORTH 

ALGO 

TOTAL 

Arnoldi 

(400) 

6.6 

9.5 

8.8 

6.9 

9.6  j 

LanFRB 

(400) 

7.0 

7.7 

9.1 

7.7 

9.1 

LanCor 

(363) 

7.0 

7.8 

8.1 

7.8 

8.4 

LanLD 

(450) 

7.0 

7.8 

None 

8.0 

8.2 

We  were  surprised  at  the  large  number  of  flops  required  by  the  QR  algorithm  in 
Arnoldi’s  method  in  this  example.  For  comparison  note  that  computing  the  eigenval¬ 
ues  of  B  by  the  QR  algorithm  requires  le  +  11  flops. 

Maintaining  semi-duality  requires  an  order  of  magnitude  fewer  flops  than  full 
duality.  Also  LanCor  has  an  order  of  magnitude  fewer  flops  than  Arnoldi’s  method. 

LanLD  requires  the  fewest  flops  and  takes  the  most  steps.  The  low  flop  count  is 
due  to  the  less  rigorous  stopping  criteria.  The  error  introduced  by  accepting  a  Ritz 
value  as  an  eigenvalue  is  estimated  by  the  smaller  of  three  quantities:  the  distance 
from  the  Ritz  value  to  the  nearest  remaining  Ritz  value,  the  left,  and  the  right  un¬ 
normalized  residuals.  Recall  from  §2.2  that  if  v  is  an  eigenvector  of  VL~lT,  then  Qv  is 
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Figure  4:  Numerical  duality  for  LanCor  applied  to  Tolosa  matrix 


used  to  approximate  eigenvector  of  B ,  and  reliable  accuracy  estimates  must  factor  in 
the  shrinkage  HCMh/IMh-  We  observed  shrinkage,  i.e.  HQHI2/IMI2  ~  -0^  f°r  all  the 
Ritz  vectors  of  interest  in  this  example.  For  this  reason  the  error  estimates  based  on 
un-normalized  Ritz  vectors  are  100  times  too  small.  In  LanLD  the  Lanczos  vectors 
are  not  stored  and  so  the  shrinkage  of  the  Ritz  vectors  is  not  available.  Even  if  the 
Lanczos  vectors  are  stored,  forming  the  eigenvectors  requires  le  +  8.6  real  floating 
point  operations  (see  §2.2).  That  is,  if  we  demand  reliability  from  LanLD  similar  to 
that  of  LanCor.  the  LanLD  flop  count  will  increase  above  the  LanCor  flop  count. 

We  conclude  by  illustrating  the  effectiveness  of  the  duality  monitoring  algorithm 
of  §4.4.1.  Figure  4  compares  the  log10  of  our  estimate  of 

max(||  |n,|-I'2a'?,+,||.,ll  |n,|-I/2«.-p,+il|1)  (1) 

at  each  step  (dash-dot  line),  z,  to  the  exact  value  (solid  line).  Each  spike  indicates  a 
correction  step.  The  dotted  line  across  the  top  of  the  figure  is  the  target  threshold 
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